
 graph transformer



Leveraging Contrastive Learning for Enhanced Node Representations in Tokenized Graph Transformers

Neural Information Processing Systems

While tokenized graph Transformers have demonstrated strong performance in node classification tasks, they construct token sequences from only a limited subset of nodes with high similarity scores. This overlooks valuable information from other nodes and hinders their ability to fully harness graph information for learning optimal node representations. To address this limitation, we propose a novel graph Transformer called GCFormer. Unlike previous approaches, GCFormer uses a hybrid token generator to create two types of token sequences, positive and negative, to capture diverse graph information. A tailored Transformer-based backbone then learns meaningful node representations from these generated token sequences. Additionally, GCFormer introduces contrastive learning to extract valuable information from both positive and negative token sequences, enhancing the quality of learned node representations. Extensive experimental results across various datasets, including homophily and heterophily graphs, demonstrate the superiority of GCFormer in node classification compared to representative graph neural networks (GNNs) and graph Transformers.
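The idea of building both a positive and a negative token sequence for a target node can be illustrated with a minimal sketch. The abstract does not say how GCFormer scores or selects nodes, so everything here is an assumption made for illustration: cosine similarity on raw node features, the hypothetical helper `token_sequences`, and the choice of the `k` most and `k` least similar nodes.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def token_sequences(features, target, k):
    """Return (positive, negative) node-index lists for one target node.

    Assumption: positives are the k most similar nodes, negatives the k
    least similar ones. This is an illustrative rule, not the paper's.
    """
    others = [i for i in range(len(features)) if i != target]
    ranked = sorted(
        others,
        key=lambda i: cosine(features[target], features[i]),
        reverse=True,
    )
    positive = ranked[:k]          # most similar first
    negative = ranked[-k:][::-1]   # least similar first
    return positive, negative


# Toy feature matrix: one 2-dimensional vector per node.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0], [0.7, 0.7]]
pos, neg = token_sequences(features, target=0, k=2)
print(pos, neg)
```

The two lists do not overlap as long as `k` is at most half the number of candidate nodes. In a contrastive setup, the representation learned from the positive sequence would be pulled toward the target node and the one from the negative sequence pushed away.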


Gaussian Process Limit Reveals Structural Benefits of Graph Transformers

arXiv.org Machine Learning

Graph transformers are the state of the art for learning from graph-structured data and are empirically known to avoid several pitfalls of message-passing architectures. However, there is limited theoretical analysis of why these models perform well in practice. In this work, we prove that attention-based architectures have structural benefits over graph convolutional networks in the context of node-level prediction tasks. Specifically, we study the neural network Gaussian process limits of graph transformers (GAT, Graphormer, Specformer) with infinite width and infinite heads, and derive the node-level and edge-level kernels across the layers. Our results characterise how the node features and the graph structure propagate through the graph attention layers. As a specific example, we prove that graph transformers structurally preserve community information and maintain discriminative node representations even in deep layers, thereby preventing oversmoothing. We provide empirical evidence on synthetic and real-world graphs that validates our theoretical insights, such as the finding that integrating informative priors and positional encoding can improve the performance of deep graph transformers.


Supra-Laplacian Encoding for Transformer on Dynamic Graphs

Neural Information Processing Systems

Fully connected Graph Transformers (GT) have rapidly become prominent in the static graph community as an alternative to Message-Passing models, which suffer from a lack of expressivity, oversquashing, and under-reaching. However, in a dynamic context, by interconnecting all nodes at multiple snapshots with self-attention, GTs lose both structural and temporal information. In this work, we introduce Supra-LAplacian encoding for spatio-temporal TransformErs (SLATE), a new spatio-temporal encoding to leverage the GT architecture while keeping spatio-temporal information. Specifically, we transform Discrete Time Dynamic Graphs into multi-layer graphs and take advantage of the spectral properties of their associated supra-Laplacian matrix. Our second contribution explicitly models nodes' pairwise relationships with a cross-attention mechanism, providing an accurate edge representation for dynamic link prediction. SLATE outperforms numerous state-of-the-art methods based on Message-Passing Graph Neural Networks combined with recurrent models (e.g., LSTM), and Dynamic Graph Transformers, on 9 datasets. Code is open-source and available at https://github.com/ykrmm/SLATE.
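The construction of a multi-layer graph and its supra-Laplacian can be sketched in a few lines. This is a minimal illustration, not the SLATE implementation: the function name `supra_laplacian`, the rule of linking each node to its own copy in the next snapshot, and the `coupling` weight are all assumptions, and snapshots are taken to be symmetric adjacency matrices without self-loops.

```python
def supra_laplacian(snapshots, coupling=1.0):
    """Build L = D - A for a multi-layer graph stacked from T snapshots.

    snapshots: list of T symmetric N x N adjacency matrices (no self-loops).
    coupling:  assumed weight of the edge linking node i at time t to
               node i at time t + 1.
    """
    T = len(snapshots)
    N = len(snapshots[0])
    size = T * N
    A = [[0.0] * size for _ in range(size)]

    # Intra-layer edges: each snapshot fills one diagonal block.
    for t, adj in enumerate(snapshots):
        off = t * N
        for i in range(N):
            for j in range(N):
                A[off + i][off + j] = float(adj[i][j])

    # Inter-layer edges: connect each node to its copy in the next snapshot.
    for t in range(T - 1):
        for i in range(N):
            a, b = t * N + i, (t + 1) * N + i
            A[a][b] = coupling
            A[b][a] = coupling

    # Combinatorial Laplacian: degree on the diagonal, -A elsewhere.
    L = [[-A[r][c] for c in range(size)] for r in range(size)]
    for r in range(size):
        L[r][r] = sum(A[r])
    return L


# Two snapshots over 3 nodes: edge 0-1 at t=0, edge 1-2 at t=1.
g0 = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
g1 = [[0, 0, 0], [0, 0, 1], [0, 1, 0]]
L = supra_laplacian([g0, g1])
```

The eigenvectors of `L` (computed with any linear algebra library) would then serve as a joint spatio-temporal positional encoding, with one row per (node, snapshot) pair.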




Unifying Generation and Prediction on Graphs with Latent Graph Diffusion

Cai Zhou

Neural Information Processing Systems

However, compared with the huge success of generative models in natural language processing [Touvron et al., 2023] and computer vision [Rombach et al., 2021], graph generation is faced with many




b4fd1d2cb085390fbbadae65e07876a7-Supplemental.pdf

Neural Information Processing Systems

The formulation is very similar to the method for learning positional node embeddings. A synthetic molecular graph regression dataset, where the predicted score is given by the subtraction of computationally estimated properties $\mathrm{logP} - \mathrm{SA}$. The task is to classify the nodes into 2 communities, testing the GNN's ability to recognize predetermined subgraphs. For the training parameters, we employed an Adam optimizer with a learning rate decay strategy initialized in $\{10^{-3}, 10^{-4}\}$ as per [15], with some minor modifications: ZINC [15]. We selected an initial learning rate of $7 \times 10^{-4}$ and increased the patience from 10 to 25 to ensure convergence.
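A patience-based learning rate decay of the kind mentioned here can be sketched as follows. The initial rate of 7e-4 and the patience of 25 come from the text. The decay factor of 0.5, the class name `PlateauDecay`, and the rule of decaying once the number of non-improving epochs exceeds the patience are assumptions chosen for illustration.

```python
class PlateauDecay:
    """Multiply the learning rate by `factor` when validation loss stalls.

    Assumption: the rate decays once the count of consecutive epochs
    without improvement exceeds `patience`.
    """

    def __init__(self, lr, factor=0.5, patience=25):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss and return the current rate."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs > self.patience:
                self.lr *= self.factor
                self.bad_epochs = 0
        return self.lr


# Constant validation loss: the first epoch sets the best value,
# and the following epochs all count as non-improving.
sched = PlateauDecay(lr=7e-4, factor=0.5, patience=25)
lrs = [sched.step(1.0) for _ in range(27)]
```

A larger patience makes the schedule more tolerant of noisy validation curves, which is consistent with the stated goal of ensuring convergence.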